Public discourse semantics. A method of anticipating economic crisis

D. Gîfu, D. Cristea

Daniela Gîfu
Alexandru Ioan Cuza University of Iasi, Faculty of Computer Science
16, General Berthelot St., 700483 Iasi, Romania
E-mail: daniela.gifu@info.uaic.ro

Dan Cristea
1 Alexandru Ioan Cuza University of Iasi, Faculty of Computer Science
16, General Berthelot St., 700483 Iasi, Romania
2 Romanian Academy - the Iasi branch, Institute for Theoretical Computer Science
2, T. Codrescu St., 700481 Iasi, Romania
E-mail: dcristea@info.uaic.ro

Abstract

This paper shows that anticipating an economic crisis by analysing public discourses (in particular, speeches on economic issues) is feasible. It proposes a method of text classification and semantic interpretation based on natural language processing techniques that could be used to trace, over a period of time, the discourses of the print press, with the aim of evaluating the likelihood that a crisis will occur. Classification is the task of assigning tags (words, expressions) to the texts that make up a corpus. In our case, we were interested in identifying, among the texts under scrutiny, those belonging to classes like financial, economic, nationalism, etc. This approach is sustained by the fact that public discourses can be characterized from a rhetorical perspective, depending on the specific strategies their authors have chosen: orientation to change opinions or to determine action, the ratio between the rational (logos) and the emotional (pathos), etc. We propose an automatic analysis of the content of the public language, using quantitative measures. Our purpose was to develop a computational tool able to offer researchers in the economic, social or political sciences, and no less the public at large, the possibility to measure the acuity of different accents of a written public discourse (financial, emotional, etc.), as a means of anticipating the threat of financial accidents. Such a tool could help the
processes of decision making in the analysis of crises. Although our analysis used as data the journalistic and economic environments of Romania, it could easily be extrapolated to other languages and countries.

Keywords: public language, text categorization, semantic analysis, economic crisis

1 Introduction

In the attempt to reveal crises ante factum in public discourse, one must listen primarily to the voices of those entities that are most influential in the financial and economic domains. These entities are, clearly, the Romanian National Bank (in the internal context) and the World Bank (in the international context)¹. The voices of these entities are best heard in the public speeches of governors and, in many cases, of journalists specialised in economic-financial issues.

¹ "In times of internal or international crisis ( ), we talk about managing various symbolic aspects of the role of: guardian of institutions, guarantor of national unity, moderator

INT J COMPUT COMMUN

A public discourse arguing on some extremely important issue of the moment is, most of the time, an amalgam of arguments, rational forms, descriptions and stylistic procedures, intended to inform a receiver or to prepare it for a problematic reality. But, however close to its subject a discourse may be, it often hides, in subtle ways, the true nature of the emitter's subjective thinking. For instance, an exaggerated trust in the fresh energy of society, in the benefits that mortgage loans could bring to ordinary people, in the exceptional rise of interest rates, or in the incredibly high bonuses certain banks are offering to their highly ranked employees could simultaneously carry the negative news that something wrong is in the air, that a crisis is creeping in. Decoding this hidden message, which is most of the time transmitted unintentionally, could be done only by someone extremely sensitive to all facets of financial and economic life. Signals for economic crises are
issued by the central banks (e.g. the Federal Reserve System, the central bank of the U.S., the European Central Bank, etc.). During the period 2001-2008, when the banking system issued large but artificially cheap credits, there were many public rhetorical appearances favourable to this behaviour, which tried to promote investment in economic development with questionable prudence. Slogans like "a home for every American" (U.S.) or "credit with only an ID" (Romania), addressing a wide range of borrowers but offering extremely low interest rates, could have been taken as signals of a potential economic crisis.

In this study we address the question: Can an economic crisis be anticipated by evaluating public discourses from a lexical-semantic perspective?

We are interested in pursuing a content analysis of the public language, using for that investigation tools that belong to the domain of natural language processing (NLP) and addressing: vocabulary (key words, frequent words), semantics (classes of concepts arranged in a hierarchy) and rhetorical-pragmatic discursive strategies (presence of the first person, preference for vague statements, generalities, etc.).

In the U.S., the tradition of quantitative analysis is very strong, its roots having been laid by Lasswell. In Europe, the interest grew more towards the theoretical investigation of the semiotics of discourse. Modern content analysis is not only an illustration of a theory of text, but should also be rooted in empirical data. On the other hand, the American analysis is often neutral, technical and comparative, while the European analysis (especially the Critical Discourse Analysis model²) has a critical component and a fairly strong ethical one.

In the perspective of our study, we are interested in public discourses (speeches), in written form, given by specialists in economics or by journalists, on economic issues. It is known that an economic crisis follows either a period of economic prosperity or, as happened recently, a previous crisis. In our investigation
we have used texts produced by the most pertinent spokesmen, which appeared in press materials issued by the Romanian National Bank (BNR), the most legitimate voice on economic issues in Romania. The other major filter in selecting the texts that should populate our corpus was the economic context (e.g. economic stability vs. economic crisis). A text categorization application filtered a stream of news that was considered of interest for our research. Some of the topics of interest were: "credit ID only", "real estate boom", "mortgages", and "transactions with land or housing".

On the other hand, at the base of our quantitative investigation lies a lexical-semantic database. In order to assure generality, in acquiring it we had to use rather neutral sources, not necessarily tied to our specific corpus of texts. As such, the lexicon and the semantic classes have been collected from different sources usually dealing with economic themes: the BNR publications, already mentioned, but also a collection of dailies (Ziarul financiar, Curierul National, Bursa) that have been monitored for a long period of time.

² "Critical theories, thus also CDA, are afforded special standing as guides for human action. They are aimed at producing enlightenment and emancipation. Such theories seek not only to describe and explain, but also to root out a particular kind of delusion. Even with differing concepts of ideology, critical theory seeks to create awareness in agents of their own needs and interests."

Current empirical approaches to analysing the public language put NLP techniques to work, by which a multitude of features of the discourse are extracted and interpreted. The domain of NLP includes a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis, for the purpose of achieving human-like language processing for a range of tasks or applications.

In this paper we describe a platform (Discourse
Analysis Tool - DAT) specialised in the interpretation of the public discourse, which integrates a range of language processing tools with the intent to build complex characterisations of the public discourse. The idea behind it is that the vocabulary betrays discursive tonalities, thus allowing interpretations of the speaker's orientation.

The paper is structured as follows. Section 2 briefly describes the previous work. Section 3 presents the DAT software and Section 4 discusses an example of comparative analysis of economic discourses, produced over one year (2007-2008). Finally, Section 5 highlights interpretations anchored in our analysis and presents conclusions.

2 Previous work

The aim of an interdisciplinary approach such as analysing the language of public speeches is to define and explain different discursive contexts, in our case those reflected in the print media.

The studies in this direction have mainly concentrated on three tasks. The first had to do with the cognitive side and, often, the emotional side of how humans acquire, produce and understand language. The second aimed at understanding the relationship between the linguistic utterance and the world, and the third at understanding the linguistic structure of language as a communication device. Linguistics has usually treated language as an abstract object which can be accounted for without reference to social or political concerns of any kind.

As we will see, one aspect of the platform that we present concerns a lexical-semantic functionality, which has some similarities with the approach used in Linguistic Inquiry and Word Count (LIWC), an American software used to analyse the 2008 elections in the United States. There are, however, important differences between the two platforms. LIWC-2007 basically counts words and increments counters associated with their declared semantic classes. DAT performs part-of-speech (POS) tagging and lemmatization of words. The lexicon contains a collection of lemmas
(over 8800) for the POS categories of verb, noun, adjective and adverb, each being associated with one or more semantic classes. In the context of the lexical-semantic analysis, the pronouns, numerals, prepositions and conjunctions, considered to be semantically empty, have been left out. In addition, a special section of the lexicon includes expressions. An expression is defined as a sequence of stems of words. DAT now includes 33 semantic classes, chosen to fit optimally the necessities of interpreting the public discourse, five of them having been added recently (failures, nationalism, moderation, firmness, spectacular).

Another range of differences between the two platforms regards the user interface. In DAT, the user is served by a friendly interface, offering a range of services: opening and displaying one or more files, editing and saving the text, undo/redo functions, functions for editing the lexicon, visualization of the occurrences of certain semantic classes in the text, etc. The menus offer a whole range of output visualization functions, from the tabular form to graphical representations and to printing services.

Finally, and most importantly, to help the user interpret different authors simultaneously, she/he can choose among a collection of formulas that facilitate comparative studies.

Figure 1: The DAT interface: in the left window appear the selected files, in the middle window the text from the selected file, and in the right window information about the text (language, word count, dominant class, etc.). Below, a plot chosen from a range of graphical styles is displayed. By selecting a specific class in the middle window, all words assigned to that class are highlighted in the text.

3 The DAT platform

The Discourse Analysis Tool (DAT, currently at version 3) considers the public discourse from two perspectives: lexical and semantic. We briefly describe our platform, which integrates a range of language processing tools,
with the intent to build complex characterisations of the public discourse. The concept behind this method is that the vocabulary used by a speaker betrays the author's sensibility, her/his level of culture, her/his cognitive world and, through these, the semantic spectrum of her/his speeches, while the syntax may reveal the level of culture, intentional persuasive attitudes towards the public, etc. Some of these means of expression are intentional, aimed at delivering a certain image to the public, while others are unintentional.

Figure 1 shows a snapshot of the interface displaying a semantic analysis during a working session. To display the results of the lexical-semantic analysis, the platform incorporates two alternative views: graphical (pie, function, columns and areas) and tabular (Microsoft Excel compatible).

The vocabulary of the platform covers 33 semantic classes (swear, social, family, friends, people, emotional, positive, negative, anxiety, anger, sadness, rational, intuition, determine, uncertain, certain, inhibition, perceptive, see, hear, feel, sexual, work, achievements, failures, leisure, home, financial, religion, nationalism, moderation, firmness, spectacular), considered to fulfil optimally the necessity of interpreting the public discourse in different contexts. Some of these categories are placed in a hierarchical relation.

Linguistic processing begins with tokenization, part-of-speech tagging and lemmatization. Only the words belonging to the lexicon are considered relevant and therefore count in establishing the weights of the different semantic classes. In response to the text sent by the user, the system returns a compendium of data which includes: the language of the document, the number of words, the type of discourse detected, a unique identifier (usually the file name), and a report of the lexical-semantic analysis.

Table 1: Examples of phrases on economy issues, from BNR editorials

Classes | Original in Romanian | English equivalent
financial, positive | cresterea PIB, expansiunea economiei mondiale, investitii, scaderea ratei somajului, expansiunea economica | GDP growth, global economic growth, investments, unemployment has declined, economic growth
financial, negative | moderarea ritmului de crestere a salariilor, gradul de incertitudine, turbulente pe pietele financiare, efectul inhibitor asupra consumului si investitiilor | moderate the wage growth, uncertainty, financial markets turmoil, dampening impact on consumption and investment

Our interest went mainly to determining those discursive attitudes able to betray an approaching recession. But the system can be parameterised to fit other conjunctures as well: the user can define at will her/his semantic classes, which, as indicated, are partially placed in a hierarchy. Thus, for example, for the lemma economist, the following classes are assigned: 2 = social and 5 = people. The class people is a subclass of the class social. These classes and their hierarchy are defined in an XML-like manner.

Whenever an occurrence belonging to a lower-level class is detected in the input file, all counters in the hierarchy, from that class to the root, are incremented. In other words, the lexicon assigned to a superior class includes all words/lemmas of its subclasses.

4 A comparative study

4.1 The corpus

The corpus used for our investigation was configured to allow a comparative study of the discursive characteristics of economic-financial themes, by including economy texts published on the BNR site in three different periods:

1. April-June 2007, when Romania crossed a period of economic stability;
2. April-June 2008, when Romania was near the economic crisis;
3. July 2008, when the Romanian president declared the economic recession.

Table 1 presents examples of phrases in the economy domain that exhibit two different discourse moods: positive emotional and negative emotional. The analyzed texts were essentially dealing with the topics social and financial. After processing the texts with the DAT software, the following classes proved to have
preponderant occurrences: financial, social, work, emotional (positive and negative), rational (intuition, determine, uncertain, certain and inhibition) and nationalism. To stress the distinguishing features, only these classes were finally left on the graphics.

4.2 The lexical-semantic analysis

We show in this section the results output by DAT when analysing the streams of textual data belonging to the three sections of the corpus (presented in Section 4.1). For that, we have used the DAT feature of performing comparative studies. The values are expected to reflect the indicated classes correctly, because they were computed by averaging over the whole collections of texts, not just over a single text. The graphics considered for the interpretation computed one-to-one differences, as given by Formula 1, included in the DAT Mathematical Functions Library:

$$\mathrm{Diff}^{1\text{-}1}_{x;y} = \mathrm{average}(x) - \mathrm{average}(y) \qquad (1)$$

where x and y are two streams, average(x) and average(y) are the average frequencies of x and y over the whole stream, and the difference is computed for each selected class.

Since a difference can lead to both positive and negative values, these particular graphs should be read as follows: values above the horizontal axis are those prevailing in the first element more than in the second, and those below the horizontal axis show the reverse prominence. A zero value indicates equality. Our experience showed that values below the threshold of 0.5% should be considered irrelevant and, therefore, they were ignored in the interpretation.

Figure 2: Difference between the occurrence of semantic classes in BNR editorials: one year before the economic recession versus three months before.

So, the graphical representation in Figure 2, in which the editorials (Apr.-Jun. 2007) are compared against the editorials (Apr.-Jun. 2008), should be interpreted as follows: in 2007 the BNR discourse was extremely optimistic (high difference values of the class positive) and gave high importance to Romanian
specific aspects (class nationalism), while in 2008 (near recession time) the BNR discourse had become rather pessimistic (class negative) and speculative (class intuition) with respect to the Romanian economic future. In the following we compare the same 2007 discourse against the BNR discourse immediately after the recession was declared.

The graphical representation in Figure 3, in which the editorials (Apr.-Jun. 2007) are compared against the editorials (July 2008), should be interpreted as follows: the difference in optimism between the BNR discourse one year before the recession and that of the moment the crisis was officially declared (class positive) is more pronounced (1.25% here versus 0.89% in Figure 2). However, although the pessimistic tone (class negative) is more pronounced in July 2008 than in the period of stability, it has weakened in intensity. We could say that BNR is cautious not to push too hard on the distress pedal, because its voice could influence the fixing and, by that, worsen the financial market even more. Moreover, BNR is offering a possible immediate solution, by placing the accent on the job sphere (class work).

Figure 3: Difference between the occurrence of semantic classes in BNR editorials: one year before the economic recession versus one month after the economic recession.

5 Conclusions and Future Work

In this paper we presented a quantitative method and an application that strengthen the idea that crises can be anticipated by monitoring public speeches produced by representative entities. We are aware that some of the differences we have evidenced in our comparative study should partially be attributed to idiosyncratic rhetorical styles. However, when the traits inventoried acquire the regularity of patterns, they can serve as measuring instruments and, properly used, could emit useful signals to a receptive society.

There are a number of ways in which we think our research could be continued. First, we want to add new features to the platform, with a special
emphasis on the syntactic and rhetorical levels of analysis. The new release of DAT should help the user to identify and count patterns of use at the syntactic and rhetorical levels. Another line to be continued regards the evaluation metrics, which have not received enough attention until now. We are currently studying other statistical metrics able to give a more comprehensive image of the different facets of the public discourse.

A weakness of the present system is the fact that the unequal sizes of the lexicons associated with the semantic classes can influence the decisions: the more entries a certain class has in the lexicon, the higher its influence can be expected to be. The solution to this problem is not to balance the classes in their number of entries, because the language makes them intrinsically unequal, but to find calibration techniques that bring their values into equivalent ranges, irrespective of the dimensions of the lexicons. Let us note that in the present study we have counterbalanced this skew to some extent by using the difference-based formulas (thus avoiding absolute values).

Surely, the problem of characterising public speeches receives no final solution with our approach. We believe, however, that our method sheds an interesting light on the possibilities of automatically interpreting discourses and, equally, that it opens new perspectives.

Acknowledgements

In pursuing this research the authors had partial support from the projects POSDRU-63663, ICT-PSP 250467-ATLAS and ICT-PSP 270893-Metanet4U.

References

Burger, C., Textanalyse als Ideologiekritik. Zur Rezeption zeitgenossischer Unterhaltungsliteratur, Frankfurt am Main, Athenaum, 1973.

Cristea, D., Raschip, M., Forascu, C., Haja, G., Florescu, C., Aldea, B., Danila, E., The Digital Form of the Thesaurus Dictionary of the Romanian Language, in Proceedings of SpeD 2007, Speech Technology and Human-Computer Dialogue, Iasi, May 10-12, 2007.

Gerstle, J., Comunicarea politica, trad. Gabriela Camara Ionesi,
Institutul European, Iasi, 94, 2002.

Gîfu, D., Cristea, D., Computational Techniques in Political Language Processing: AnaDiP-2011, in J. J. Park, L. T. Yang, and C. Lee (Eds.), FutureTech 2011, Part II, CCIS 185, 188-195, 2011.

Lasswell, H. D., Politics: Who Gets What, When, How, McGraw-Hill, New York, 1936.

Lazarsfeld, P. F., Berelson, B., Hazel, G., The People's Choice: How the Voter Makes up His Mind in a Presidential Campaign, 3d ed., New York, Columbia University Press, 1944.

Perelman, C., Olbrechts-Tyteca, L., Traite de l'argumentation, Ed. de l'Institut de Sociologie de l'Universite Libre de Bruxelles, 72, 1972.

Plett, H. F., Stiinta textului si analiza de text, trad. Speranta Stanescu, Ed. Univers, Bucuresti: 72, 1983.

Romaine, S., Language in Society. An Introduction to Sociolinguistics, Oxford University Press Inc., New York, 1994.

van Dijk, T. A., Semantique generale et theorie des textes, Linguistics, 62, 66-95, 1970.

Wodak, R., Critical Linguistics and Critical Discourse Analysis, Handbook of Pragmatics, Benjamins, 2006.

Daniela Gîfu (b. January 10, 1973) received her M.Sc. in Communication and PR (2004) and a PhD in Philosophy (2010) from the "Alexandru Ioan Cuza" University of Iasi. She now holds a temporary position of associate professor of Communication at Journalism, at the Faculty of Letters of the "Alexandru Ioan Cuza" University of Iasi, and is pursuing postdoctoral research in the Faculty of Computer Science at the same university. Her current research interests include different aspects of Natural Language Processing in correlation with discourse analysis. She has (co-)authored 5 books and more than a dozen articles, and has participated in more than 20 conferences. She is an Honorary Citizen of the village of Balabanesti, county of Galati. She is a member of the Canadian Association of Romanian Writers.

Dan Cristea (b. December 16, 1951) received an M.Sc. in Computer Engineering (1975) from the University Politehnica of Bucharest, an M.Sc. in Mathematics (1981) from the "Alexandru Ioan Cuza" University of Iasi and a PhD degree in Computer Engineering (1994) from the University Politehnica of Bucharest. He holds a position of full professor in the Department of Computer Science of the "Alexandru Ioan Cuza" University of Iasi and is a principal researcher at the Institute for Theoretical Computer Science of the Romanian Academy, the Iasi branch. He is known for his research on the structure of discourse (Veins Theory), anaphora resolution (the system RARE), the Romanian WordNet, summarisation, etc. He received the "Grigore Moisil" award of the Romanian Academy in the Information Technology section and is a corresponding member of the Romanian Technical Academy.
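
As an illustration of the counting scheme described in Section 3 and of the one-to-one difference of Formula 1 in Section 4.2, the following is a minimal Python sketch, not the DAT implementation. Everything named in it is an assumption made for the example: the `PARENT` and `LEXICON` fragments, the English stand-in lemmas, the function names, the choice of expressing a frequency as the percentage of tokens of a text that hit a class, and the choice of counting a class at most once per token when several of its subclasses are hit. The input is assumed to be already tokenized and lemmatized.

```python
from collections import Counter

# Hypothetical fragment of the class hierarchy: child -> parent (None marks a root).
PARENT = {
    "social": None, "people": "social",
    "emotional": None, "positive": "emotional", "negative": "emotional",
    "financial": None,
}

# Hypothetical lexicon fragment: lemma -> directly assigned classes.
LEXICON = {
    "economist": ["social", "people"],
    "growth": ["positive", "financial"],
    "turmoil": ["negative", "financial"],
    "investment": ["financial"],
}


def classes_with_ancestors(cls):
    """Return cls followed by every class above it, up to the root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


def class_frequencies(lemmas):
    """Percentage of tokens hitting each class in one lemmatized text."""
    counts = Counter()
    for lemma in lemmas:
        hit = set()  # each class is counted at most once per token (assumption)
        for cls in LEXICON.get(lemma, []):
            hit.update(classes_with_ancestors(cls))
        counts.update(hit)
    total = len(lemmas)
    return {cls: 100.0 * n / total for cls, n in counts.items()} if total else {}


def average_frequencies(texts):
    """Average the per-text class frequencies over a non-empty stream of texts."""
    per_text = [class_frequencies(text) for text in texts]
    return {c: sum(f.get(c, 0.0) for f in per_text) / len(per_text) for c in PARENT}


def diff_1_1(stream_x, stream_y, threshold=0.5):
    """average(x) - average(y) per class, dropping values below the threshold."""
    avg_x = average_frequencies(stream_x)
    avg_y = average_frequencies(stream_y)
    diffs = {c: avg_x[c] - avg_y[c] for c in PARENT}
    return {c: d for c, d in diffs.items() if abs(d) >= threshold}


if __name__ == "__main__":
    # Two toy streams standing in for two periods of the corpus.
    stream_x = [["growth", "investment", "economist", "report"],
                ["growth", "growth", "bank", "year"]]
    stream_y = [["turmoil", "investment", "economist", "report"]]
    for cls, value in sorted(diff_1_1(stream_x, stream_y).items()):
        print(f"{cls}: {value:+.2f}")
```

Positive values mark classes that prevail in the first stream and negative values those that prevail in the second, matching the reading of the graphs given in Section 4.2. The default threshold of 0.5 mirrors the 0.5% cut-off mentioned there.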